One of the key features of SIMPLEGEN is the ability to generate
“transmission records” (see the SIMPLEGEN
vignette) where the transmission history of every infection is
tracked and recorded (eg mosquito to human transmission event, to next
mosquito blood meal and subsequent transmission event to another human
etc). Given we are not interested in simulating genetics in SLiM per se,
we are instead interested in the ability to generate tree-sequence
recording of such “transmission history”. We explore the possibility of
implementing a simple Ross-Macdonald epidemiological model with no
genetics in SLiM with tree-sequence recording below.
Model details
To build the Ross-Macdonald epidemiological model we use the
multi-species functionality of SLiM to create both human and mosquito
species. In this simple model, there are two equations that drive the
dynamics, as described by Aron and May (Aron and
May 1982) (also see Smith et al for a comprehensive review of
Ross-Macdonald model theory and an overview of the model notation in Box
2 (Smith et al. 2012)):
\(\frac{dx}{dt} =
mabx(1-x)-rx\),
where \(x\) is the number of
infected humans, \(m\) is the ratio of
mosquitoes to humans, \(a\) is the
human blood feeding rate or proportion of mosquitoes that feed each day,
\(b\) is the proportion of bites by
infectious mosquitoes that infect a human, and \(r\) is the daily rate each human recovers
from infection or infection clearance rate. The parameter \(ma\) is a measurable parameter
corresponding to the human biting rate or the number of bites by vectors
per humans per day.
\(\frac{dz}{dt} = ax(1-z)-gz\),
where \(a\) is the human blood
feeding rate or proportion of mosquitoes that feed each day, \(x\) is the number of infected humans, \(z\) is the number of infected mosquitoes,
and \(g\) is the instantaneous death
rate.
Both humans and mosquitoes can transition from a susceptible to
infected state and the population sizes of humans and mosquitoes are
fixed. We used the following parameters where those that were fixed are
indicated in bold (Table 1).
Table 1. Simulation parameters for Ross-Macdonald implementation in
SLiM
| \(N_h\) |
Number of human hosts |
1000, 10000, 100000, 1000000 |
|
| \(N_v\) |
Number of mosquito vectors |
50000, 500000, 5000000 |
|
| \(ma\) |
Human biting rate |
0.1 |
This is equivalent to a mosquito biting every 10
days |
| \(r\) |
Human infection clearance rate |
0.05 |
This is equivalent to a human clearing their infection
every 3 weeks |
| \(g\) |
Mosquito mortality rate |
0.2 |
This is equivalent to every mosquito having a
probability of survival of 0.8 (see (Hendry,
Kwiatkowski, and McVean 2021)) |
It is worth noting that the aim was not to fully parameterize the
model itself as we were more interested in profiling SLiM simulations in
terms of CPU time and memory usage.
We seed the model at day 50 with 100 infected individuals and the let
simulation run for 5, 10, 25 or 50 years for the following combinations
of \(N_h\) and \(V_h\):
\(N_h\)= 1000; \(V_h\)= 50,000
\(N_h\)= 10,000; \(V_h\)= 500,000
\(N_h\)= 100,000; \(V_h\)= 5,000,000
\(N_h\)= 1,000,000; \(V_h\)= 5,000,000 (note: for the 25 and
50 year simulations we did not run this)
There are no genetics included in this model but according to the
SLiM manual, strain properties infecting an individual could in theory
be “tracked” throughout the time series. Another possibility would be to
have a three-species model where Plasmodium spp is modeled
including host-parasitoid interactions (eg parasite density, parasite
strain ID tracking, for details see SLiM manual Chapters 19.3 and 19.4).
However, for our purposes even the multi-species SLiM model is not
suitable for tree-sequence recording outputs because this functionality
is only available for tracking within one species. In our case, the
ability to record tree “branches” between mosquito to human, and human
to mosquito are necessary to fully capture the ancestral “infection”
tree for downstream genetic analyses.
Results of profiling the model
We tested the speed of the simulations by varying human and mosquito
populations sizes (1e3 to 1e6 and 5e4 to 5e6, respectively), as well as
the number of years the simulation was run (5, 10, 25 and 50 years) as
described in [Model details].
By using the clock feature on the SLiM GUI we can profile each
simulation run, which outputs useful information like CPU time elapsed
during the run (we are interested in the “Elapsed wall clock time inside
SLiM core (corrected)”, which is measured in seconds) as well as average
memory usage across all “ticks” or generations (we are interested in the
“Average tick SLiM memory use”, which is measured in megabytes).
profile_stats <- data.frame(test= c("H=1e3, M=5e4", "H=1e4, M=5e5", "H=1e5, M=5e6", "H=1e6, M=5e6",
"H=1e3, M=5e4", "H=1e4, M=5e5", "H=1e5, M=5e6",
"H=1e3, M=5e4", "H=1e4, M=5e5", "H=1e5, M=5e6",
"H=1e3, M=5e4", "H=1e4, M=5e5", "H=1e5, M=5e6"),
pop_size_humans = c(1e3, 1e4, 1e5, 1e6,
1e3, 1e4, 1e5,
1e3, 1e4, 1e5,
1e3, 1e4, 1e5),
pop_size_mosquito = c(5e4, 5e5, 5e6, 5e6,
5e4, 5e5, 5e6,
5e4, 5e5, 5e6,
5e4, 5e5, 5e6),
years = c(5, 5, 5, 5,
10, 10, 10,
25, 25, 25,
50, 50, 50),
cpu_time = c(19.12, 270.03, 2440.64, 6544.94,
38.36, 501.69, 5210.31,
117.17, 1188.88, 22804.75,
200.18, 2715.99, 41697.93),
avg_memory = c(29.02, 192.06, 1800, 2100,
29.03, 192.11, 1800,
29.03, 192.14, 1800,
29.03, 192.14, 1800))
We found that the simulation run times were extremely slow,
especially for human population sizes above 1000 (Figure 1). Even for 5
year simulations the CPU time required was between 4.5 to 109 minutes
when the human population size was 10,000 up to the 1,000,000 value
tested. The average memory usage was consistent regardless of simulation
run time, but increased by an order of magnitude in line with increases
in human population size (Figure 1, see Figure 2 for interactive plot).
When varying the mosquito population sizes we found similar trends as
for increases in human population sizes (Figure 3).
(profile_stats %>%
ggplot(aes(x = pop_size_humans, y = cpu_time/60, group = factor(years), color = factor(years))) +
geom_point() +
geom_line() +
scale_x_continuous(breaks = scales::pretty_breaks(n=6)) +
scale_y_continuous(breaks = scales::pretty_breaks(n=10)) +
scale_color_manual(values = c("hotpink3", "goldenrod3", "lightblue3", "olivedrab3")) +
xlim(0, 1e5) +
labs(x = "Human population size",
y = "CPU time (in minutes)",
color = "Simulation run time (in years)",
title = "Simple Ross-Macdonald model") +
theme_minimal() +
theme(legend.position = "none")) +
(profile_stats %>%
ggplot(aes(x = pop_size_humans, y = avg_memory, group = factor(years), color = factor(years))) +
geom_point() +
geom_line(alpha=0.5) +
scale_y_continuous(breaks = scales::pretty_breaks(n=5)) +
scale_color_manual(values = c("hotpink3", "goldenrod3", "lightblue3", "olivedrab3")) +
xlim(0, 1e5) +
labs(x = "Human population size",
y = "Average memory used (in megabytes)",
color = "Simulation run time (in years)") +
theme_minimal()) +
plot_layout(guides = "collect")
Below, the interactive plot shows CPU time and by hovering the
mouse you can see the exact values for each run.
ggplotly(profile_stats %>%
ggplot(aes(x = pop_size_humans, y = cpu_time/60, group = factor(years), color = factor(years))) +
geom_point() +
geom_line() +
scale_y_continuous(breaks = scales::pretty_breaks(n=10)) +
scale_color_manual(values = c("hotpink3", "goldenrod3", "lightblue3", "olivedrab3")) +
xlim(0, 1e5) +
labs(x = "Human population size",
y = "CPU time (in minutes)",
color = "Simulation run time (in years)",
title = "Simple Ross-Macdonald model") +
theme_minimal())
(profile_stats %>%
filter(test != "H=1e6, M=5e6") %>% # remove the test with human n=1e6 and m=5e6 as skews the trend visualization since two points have same mosq pop size
ggplot(aes(x = pop_size_mosquito, y = cpu_time/60, group = factor(years), color = factor(years))) +
geom_point() +
geom_line() +
scale_x_continuous(breaks = scales::pretty_breaks(n=5)) +
scale_y_continuous(breaks = scales::pretty_breaks(n=10)) +
scale_color_manual(values = c("hotpink3", "goldenrod3", "lightblue3", "olivedrab3")) +
labs(x = "Mosquito population size",
y = "CPU time (in minutes)",
color = "Simulation run time (in years)",
title = "Simple Ross-Macdonald model") +
theme_minimal() +
theme(legend.position = "none")) +
(profile_stats %>%
filter(test != "H=1e6, M=5e6") %>% # remove the test with human n=1e6 and m=5e6 as skews the trend visualization since two points have same mosq pop size
ggplot(aes(x = pop_size_mosquito, y = avg_memory, group = factor(years), color = factor(years))) +
geom_point() +
geom_line(alpha=0.5) +
scale_x_continuous(breaks = scales::pretty_breaks(n=5)) +
scale_y_continuous(breaks = scales::pretty_breaks(n=5)) +
scale_color_manual(values = c("hotpink3", "goldenrod3", "lightblue3", "olivedrab3")) +
labs(x = "Mosquito population size",
y = "Average memory used (in megabytes)",
color = "Simulation run time (in years)") +
theme_minimal()) +
plot_layout(guides = "collect")